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THE  USE  OF  THE  METHOD  OF  LEAST  SQUARES  IN  CALIBRATION 

by 

J.  M.  Cameron 


1 .  Introduction 

When  more  than  one  measurement  is  made  on  the  same  quantity,  we  are 
accustomed  to  taking  an  average  and  we  have  the  feeling  that  the  result 
is  "better"  than  any  single  value  that  might  be  chosen  from  the  set. 
Exactly  why  the  average  should  be  better  needs  some  justification  and 
the  fundamental  step  toward  a  general  approach  to  the  problem  of 
measurement  was  taken  by  Thomas  Simpson  in  1755.    In  showing  the 
advantage  of  taking  an  average  of  values  arising  from  a  number  of 
probability  distributions,  "he  took  the  bold  step  of  regarding  errors, 
not  as  individual  unrelated  happenings,  but  as  properties  of  the 
measurement  process  itself  ...    He  thus  opened  the  way  to  a 
mathematical  theory  of  measurement  based  on  the  mathematical  theory 
of  probability"  [3,  page  29]. 

The  taking  of  an  average  is  a  special  case  of  the  method  of  least 
squares  for  which  the  original  justification  by  Lengendre  in  1805  did 
not  involve  any  probability  considerations  but  was  advanced  as  a  con- 
venient method  for  the  combination  of  observations.    It  was  Gauss  who 
recognized  that  one  could  not  arrive  at  a  "best"  value  unless  the 
probability  distribution  of  the  measurement  errors  were  known.  In 
1798  he  showed  the  optimal ity  of  the  least  squares  values  when  the 
underlying  distribution  is  normal  and  in  1821  showed  that  the  method 
of  least  squares  leads  to  values  of  the  parameters  which  have  minimum 
variance  among  all  possible  unbiased  linear  functions^ of  the  observa- 
tions regardless  of  the  underlying  distribution.    It  is  this  property 
that  gives  the  method  of  least  squares  its  position  of  dominance 
among  methods  of  combination  of  observations. 

In  this  paper  the  statistical  concepts  needed  for  the  method  of 
least  squares  will  be  stated  as  a  prelude  to  the  usual  modern  version 
of  the  Gauss  theorem.    The  formation  of  the  observational  equations 
and  the  derivation  of  the  normal  equations  are  illustrated  for  several 
situations  arising  in  calibration.    The  role  of  restraints  in  the 
solution  of  systems  which  are  not  of  full  rank  is  discussed.  The 
results  are  presented  in  a  form  designed  to  facilitate  computation. 

*An  example  of  a  nonlinear  function  with  smaller  variance  than  the 
average  (the  "best"  linear  estimator)  is  given  by  the  midrange  for 
the  rectangular  distribution.    The  midrange  (average  of  the  largest 
and  smallest  observation)  has  variance  1/[2(N+1 ) (N+2)]  when  based 
on  n  measurements,  whereas  the  average  has  variance  1/12N.    Thus  if 
N>3,  the  midrange  is  to  be  preferred. 
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2.    The  Physical  and  Statistical  Model  of  an  Experiment 

In  physics,  one  is  familiar  with  the  construction  and  interpretation 
of  the  physical  model  of  an  experiment.    One  has  a  substantial  body  of 
theory  on  which  to  base  such  a  model  and  one  need  only  consider  the 
determination  of  length  by  interferometric  measurements  to  remind 
oneself  of  the  various  elements  involved:    a  defined  unit,  the  apparatus, 
the  procedure,  the  corrections  for  environmental  factors,  etc.  One 
realization  of  the  experiment  leads  to  values  for  the  quantities  of 
interest. 

But  one  realizes  that  a  repetition  of  the  experiment  will  lead  to 
different  values--differences  for  which  the  physical  model  does  not 
provide  corrections.    One  is  thus  confronted  with  the  need  for  a 
statistical  model  to  account  for  the  variations  encountered  in  a  sequence 
of  measurements.    In  building  the  statistical  model,  one  is  first  faced 
with  the  issue  of  what  is  meant  by  a  repetition  of  the  experiment--many 
readings  within  a  few  minutes  or  ab  lYiitio  determinations  a  week  apart. 

The  objective  is  to  describe  the  output  of  the  physical  process 
not  only  in  terms  of  the  physical  quantities  involved  but  also  in  terms 
of  the  random  variation  and  systematic  influences  due  to  environmental, 
procedural,  or  instrumental  factors  in  the  experiment. 


3.    Equation  of  Expected  Values  of  the  Observation 

If  one  measured  the  same  quantity  again  and  again  to  obtain  the 
sequence 

yi.y2.-  .  .     •  •  • 

then  if  the  process  that  generates  these  numbers  is  "in  control,"  the 
long  run  average  or  tiMitlnQ  mean,       will  exist.    By  "in  control"  one 
means  that  the  values  of  y  behave  as  random  variables  from  a  probability 
distribution  (for  a  discussion  of  this  topic,  see  Eisenhart  [1]).  This 
limiting  mean,  y,  is  usually  called  the  expected  vatini  of  y  designated 
by  the  operator  E(  )  so  that  the  statement  becomes  in  symbols  E(y)  =  y. 
Because  y  is  regarded  as  a  random  variable  one  can  represent  it  as 

y  =  y  +  e 

where  e  is  the  random  component  that  follows  some  probability  distri- 
bution with  a  limiting  mean  of  zero,  i.e.,  E(e)  =  0. 

The  quantity  y  may  involve  one  or  more  parameters.    Consider  the 
measurement  of  the  difference  in  length  of  all  distinct  pairings  of 
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four  gage  blocks.  A,  B,  C,  D.  Denote  the  6  measurements  by  y] , 
then  one  may  write 


y6> 


E(yJ 

=  A-B 

ECy?) 

c 

=  A-C 

E(yo) 

=  A-D 

E(y4) 

=  B-C 

ECy^) 

=  B-D 

ECyg) 

=  C-D 

Other  representations  are  useful. 

Observation         Expected  Value:  E(y) 


^1 

H 
H 


A  -  B 

A        -  C 

A  -  D 

B  -  C 

B        -  D 
C  -  D 


Matrix  Form:  Xg 
1-10  0 
10-10 
10  0-1 
0  1-10 
0  10-1 
0     0  1-1 


A 
B 
C 
D 


Consider  a  sequence  of  measurements  of  the  same  quantity  in  the 
presence  of  a  linear  drift  of  A  per  observation.    The  expected  values 
are  thus: 


Observation 
^{^)  =  y 

E(y2)  =  y  +  A 

E(y3)  =  y  +  2A 


E(y^)  =  y  +  (n-l)A 


Matrix  Form:   _X 3 

y 


1  (n-1) 
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There  is  an  alternative  representation  that  measures  the  drift  from 
the  central  point  of  the  experiment  so  that  the  drift  is  represented 
by  .  .  .  -3A,  -2A,  -A,  0,  A,  2A,  3A  .  .  .  for  an  odd  number  of  obser- 
vations and  by  .  .  .  ^SA,  ::3A,  j^A,  A,  3A.  5A  .  .  .  for  an  even  number 
of  observations.         2      2     2    2    2  2 

If,  as  for  example  with  some  gage  blocks,  the  value  changes  approxi- 
mately linearly  with  time;  then  one  can  represent  the  observation  as 
follows: 


Expected  Value  E(y) 
E(y^)  =  a  +  3x^ 
E(y2)  =  a  +  3X2 


Matrix  Form:  Xg 
1       X,  a 
1       Xo  B 


E(y,)  =  a  .  3x„ 


The  sequence  of  measurements  for  the  intercomparison  of  4  gage 
blocks  is  as  follows: 


Observation 


Matrix  Form:  X3 


s.  - 

S.. 

-  7A/2 

1 

-1 

0 

0 

-f 

"  S.  ' 

Y  - 

S. 

-  5A/2 

-1 

0 

0 

1 

-5 

S.. 

X  - 

Y 

-  3A/2 

0 

0 

1 

-1 

-3 

X 

S..- 

X 

-  A/2 

0 

1 

-1 

0 

-1 

Y 

S..- 

Y 

+  A/2 

0 

1 

0 

-1 

1 

A/2 

Y  - 

S. 

+  3  A/ 2 

-1 

0 

0 

1 

3 

S.  - 

X 

+  5A/2 

1 

0 

-1 

0 

5 

X  - 

S.. 

+  7A/2 

0 

-1 

1 

0 

7 

•^8 

(Note  that  for  simplicity,  A/2  is  regarded  as  the  parameter.) 
For  a  detailed  analysis  of  this  and  related  experimental  arrangements, 
see  J.  M.  Cameron  and  G.  E.  Hailes  [1].    The  notation  is  that  used  in 
[1]  where  S.  and  S..  refer  to  reference  standards  and  X  and  Y  are  the 
objects  being  calibrated. 
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If,  as  often  occurs  in  the  intercomparison  of  electrical  standards, 
the  comparator  has  a  left-right  polarity  effect,  this  can  be  represented 
as  an  additive  effect,  a,  as  shown  below  for  the  intercomparison  of  5 
standards. 


Observation 


^3 
^4 


^8 
^9 
^10 


Expected  Value 
A  -  B 
B  - 


■A 
-A 


EM 

+  a 

C              +  a 

C  -  D        +  a 

D  -  E  +  a 

+  E  +  a 

+  D        +  a 

-  D        +  a 

+  E  +  a 

C        -  E  +  a 

C              +  a 


Matrix  Form:  Xg 
1  -1  0  0  0  1 
0  1-10  0  1 
0  0  1-10  1 
0  0  0  1  -1  1 
-1  0  0  0  1  1 
-10  0-101 
0  10-101 
0-10011 
0  0  10-11 
10-1001 


4.    Statistical  Independence 

The  sequence  of  differences  from  a  zero  measurement,  y^, 

A:         y^-yQ.  y2-yo'  ^3-^0-  •  "V^o"  •  • 

are  clearly  dependent  because  an  error  in  y  will  be  common  to  all. 
Similarly,  the  successive  differences 


n  ■'n-l      •  • 


will  be  correlated  in  pairs  because  an  error  in  y^  affects  both  the 
(n-l)st  and  n-th  difference. 
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If  it  is  assumed  in  both  cases  that  each  y^-  has  the  form      =  ui  + 
where  E(ei)  =  0,  Var  (ei)  =      and  cov  (ei.ej)  =  0,then  the  variance  of 
the  differences  for  sequence  A  is,  as  one  would  expect, 

V(y.-yQ)  =  20^ 

and  the  covariance  of  two  differences  is 

cov  (y.-yQ,  y-yo)  =  E[{e.-eQ){e^-eQ)]  =  E(e^)  =  0^ 

because  terms  of  the  form  E(e^.  ,ej)=  0 

For  sequence  B  the  variance  is  also  V(yi-yi.i)  =  2a^  and  the 
covariance  terms  are 


covl 


ro^  if  |i-j|  =  1 


These  variance-covariance  relationships  can  be  represented  in  matrix 
form: 


Sequence  A:  V= 


2  1  1  ...  1 
1  2  1  ...  1 


1  1  1  ...  2 


0^     Sequence  B:  V= 


2-1  0  0  ...  0 
-1    2  -1  0  ...  0 


0   0    0  0  ...  2 


All  are  familiar  with  the  phenomenon  of  much  closer  agreement  among 
measurements  taken  immediately  after  each  other  when  compared  to  a  sequence 
of  values  taken  days  or  weeks  apart.    The  simplest  statistical  model  for 
this  case  is  that  each  day  has  its  own  limiting  mean,  yi  =  y  +  6i ,  where 
E(6-i)  =  0,  Var(6i)  =  a§,  Cov(6i,6j)  =  0,and  the  successive  values  on 
each  day  have  the  form 

y..=y.+e..=y  +  6.+e.. 


where  E(eij)  =  0,  Var(eij)  =  aj,  Cov(e-jj,  ei^^)  =  0,  and  Cov(eij,  6|<)  =  0. 
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These  three  examples  serve  to  illustrate  the  point  that  the  physical 
conduct  of  the  experiment  is  the  essential  element  in  dictating  the 
appropriate  statistical  analysis.    In  all  three  cases  the  correlation  among 
the  variables  vitiates  the  usual  formula:    standard  deviation  of  the  mean  = 
(l//n)  standard  deviation.    (See  Appendix,  Section  1(b).) 

It  is  in  the  physical  conduct  of  the  experiment  that  one  has  to  build 
in  the  independence  of  the  measurements.    For  Sequence  A  one  could  remeasure 
the  zero  setting  each  time  or  in  Sequence  B,  make  an  independent  duplicate 
measurement.    Ordinarily  this  is  too  much  of  an  expense  to  pay  to  achieve 
uncorrelated  variables   just  for  a  simpler  analysis. 

Statistical  independence  is  to  be  desired  in  the  sense  that  if 
the  successive  measurements  are  highly  correlated,  then  many  measure- 
ments are  only  slightly  better  than  a  single  one.    The  really  important 
issue  is  that  the  proper  statistical  model  be  used  so  that  the  results 
are  valid. 


5.    Normal  Equations  For  the  Method  of  Least  Squares  (independent 
random  variables) 

When  there  are  more  observations  than  parameters,  the  "best"  (in 
the  sense  of  minimum  variance)  linear  unbiased  estimates  for  the 
parameters  are  given  by  the  so-called  least  squares  estimators.  For 
example,  assume  one  has  the  problem  of  deriving  values  for  A,  B,  C, 
and  D  from  the  following  measurements. 


Measurements 


Expected  Value: 
A 
B 
C 
D 

A  +  B 
B  +  C 
C  +  D 
D  +  A 


E(y)       Matrix  Form:  X3 
'  1    0    0  0 
0    10  0 


0  0  10 
0  0  0  1 
110  0 


0  1  1 
0    0  1 


10    0  1 
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An  obvious  estimator.  A,  is  the  average  of  the  three  values. 

Expected  Value 

^1  ^ 
y^-y^  (A+B)-B 

y8-y4  (A+D)-D 
so  that,  assuming  independent  measurements  with  variance,  o^, 

A    =  i(yi  +  y5  -  ^2  ^  ^8  -  ^4^ 
Var(A  )  = 

The  least  squares  estimator  is  obtained  by  forming  the  normal 
equations  (see  Appendix,  Section  2). 

3A  +    B  +      +  D  =  y^  +  y^  +  yg 

A  +  3B  +    C        =  ^2     ^6  ^5 
B  +  3C  +  D  =  y3  +  y7  +  yg 
A         +    C  +3D  =  y^  +  yg  +  y^ 

The  solution  gives  the  following  estimators  for  the  parameters. 

A  =  (7y^  -  3/2  +  2y3  -  3y4  +  4y5  -  y^  -  y^  +  4yg)/15 

B  =  (-3y^  +  7y2  -  3y3  +  2)f^  +  4y5  +  4yg  -  y^  -  y8)/15 

C  =  (2y^  -  3y2  +  7y3  -        -  y^  +  4yg  +  4y7  -  yg)/15 

D  =  (-3y^  +  2y2  -  3y3  +  yy^  -  y^  -  y^  +  H^^^  +  4yg)/15 

Using  formula  (1.11)  of  Appendix,  gives 

Var(A)  =  105aV225  =  21aV45  =  7aVl5 

which  can  be  compared  to  the  variance  of  A  which  was  25a^/45.    The  Gauss 
theorem  on  least  squares  guarantees  that  no  other  linear  unbiased 
estimator  will  have  smaller  variance. 
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In  matrix  form  one  has 


(X'X)B  = 


3  10  1 
13  10 
0  13  1 
10  13 


1  0  0  0  1  0  0  1 

0  10  0  110  0 

0  0  1  0  0  1  1  0 

0  0  0  1  0  0  1  1 


15 


7-3  2-3 

-3  7-3  2 

2-3  7-3 

-3  2-3  7 


1  0  0  0  1  0  0  1 

0  10  0  110  0 

0  0  1  0  0  1  1  0 

0  0  0  1  0  0  1  1 


^  "  IT 


7-3    2-3    4-1-1  4 

-3    7-3    2    4  4-1-1 

2-3    7-3-1    4  4-1 

-32-37-1-144 

When  only  differences  among  a  group  of  objects  (such  as  gage  blocks, 
voltage  cells,  etc.)  are  measured  the  normal  equation  will  not  be  of 
full  rank  so  that  a  unique  solution  will  not  exist.    For  the  design 
involving  differences  between  all  distinct  pairings  of  objects  the 
normal  equations  are,  for  the  case  of  4  objects  discussed  in  Section  3, 

3A  -  B  -  C  -  D  =  yi  +  y2  +  ^3  =  ^1 
-A  +  3B  -    C  -    D  =  -y^  +      +       =  q2 


-A  - 
-A  - 


+  3C  -    D  =  -y2  -  y^  +  yg  =  q3 


C  +  3D  =  -y. 
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Or  in  matrix  form: 


X'X3  = 


1110    0  0 
-10    0  110 
0-1    0-1    0  1 
0    0-1  0-1-1 


6  = 


3  -1  -1  -1 

-1    3  -1  -1 

-1  -1  3  -1 

-1  -1  -1  3 


3  = 


1110    0  0 
-10    0  110 
0-10-101 
0    0-1  0-1-1 


1-10  0 
10-10 
10  0-1 
0  1-10 
0  10-1 
0    0  1-1 

which  can  be  seen  not  to  be  of  full  rank  because  the  sum  of  the  four 
equations  is  zero. 

One  needs  a  baseline  to  which  the  differences  can  be  referred--a 
restraint  to  bring  the  system  of  equations  up  to  full  rank.    If  one  of 
the  objects  were  designated  as  the  standard,  or  if  a  number  (or  all) 
of  them  were  regarded  as  a  reference  group  whose  value  was  known,  values 
for  the  items  could  be  obtained. 

If  the  restraint  A  =  Kq  is  invoked,  the  normal  equations  become 
(using  the  methods  of  Appendix,  Section  3) 


3A 


B 


C  -    D    +  X  =  q 


-A  +  3B  -  C  -  D 
-A  -  B  +  3C  -  D 
-A  -    B  -    C  +  3D 


The  solution  is  given  by 
A  =  K 

B  =  K+(-2y^-y2-y3+y4+y5)/4 
C  =  K+(-y^-2y2-y3-y4+yg)/4 
6  -  Kf(-y^-y2-2y3-y3-yg)/4 

A  =  0 


^  ^2 
=  ^3 
=  ^4 
=  K 


3 

-1 

-1 

-1 

1 

-1 

3 

-1 

-1 

0 

-1 

-1 

3 

-1 

0 

-1 

-1 

-1 

3 

0 

1 

0 

0 

0 

0 

3 

X'y 

X 

K 

0 

"o 

0 

0 

0 

4 

1 

1 

1 

0 

0 

0 

0 

0 

2 

1 

1 

4 

-1 

0 

0 

1 

1 

0 

0 

0 

1 

2 

1 

4 

0 

-1 

0 

-1 

0 

1 

0 

0 

1 

1 

2 

4 

0 

0 

-1 

0 

-1 

-1 

0 

4 

4 

4 

4 

0 

0 

0 

0 

0 

0 

0 

1 
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1  0000004 

-2-1-1    1    1    0  4 

-1  -2-1-1    0  14 

-1  -1  -2  0-1-14 

0    0    0    0    0    0  0 

The  variances  of  the  values  are  V(A)  =  0;  V(B)  =  V(C)  =  V(D)  =  aV2. 

If  the  restraint  A+B+C+D=K]  is  invoked,  the  normal  equations 
become 

3A  -  B  -  C  -  D  +  A  =  q-| 
-A  +  3B  -    C  -    D  +  A  =  q2 


-A  -  B  +  3C  -  D  +  A  =  q. 
-A  -    B  -    C  +  3D  +  A  =  q. 


A  +    B  +    C  +  D 


=  K. 


B 

X'y 

A 

—I 

11110 


and  the  solution  is  given  by 

A  =  (y^+y2+y3+K^)/4 

3 

-  1 

16 

3 

-1 

-1 

-1 

4 

1 

1 

1 

0 

0 

0 

0 

B  =  (-y^+y4+y5+K^)/4 

A 

-1 

3 

-1 

-1 

4 

-1 

0 

0 

1 

1 

0 

0 

c  =  (-y2-y4+y6+Ki)/4 

-1 

-1 

3 

-1 

4 

0 

-1 

0 

-1 

0 

1 

0 

3=  (-y3-y5-y6+K^)/4 

-1 

-1 

-1 

3 

4 

0 

0 

-1 

0 

-1 

-1 

0 

A  =  0 

4 

4 

4 

4 

0 

0 

0 

0 

0 

0 

0 

1 

16 


y 


4 

4 

4 

0 

0 

0 

4 

-4 

0 

0 

4 

4 

0 

4 

0 

-4 

0 

-4 

0 

4 

4 

0 

0 

-4 

0 

-4 

-4 

4 

0 

0 

0 

0 

0 

0 

0 

The  variances  of  the  values  are  V(A)  =  V(B)  =  V(C)  =  V{D)  =  3aVl6. 
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Although  it  is  a  simple  matter  to  change  the  reference  point  for 
the  parameters  (i.e.,  change  the  restraint)  after  one  solution  has  been 
found,  the  corresponding  change  of  variances  for  the  parameter  values 
should  not  be  ignored.    These  variances  are  given  by  the  diagonal  terms 
of  the  inverse  of  the  matrix  of  normal  equation,  the  inverse  being 
indicated  by  double  brackets  in  these  examples.    The  difference  in 
variance  for  §  in  the  last  example,  arises  from  the  fact  that  in  the 
first  case  one  is  concerned  only  with  the  difference  between  A  (the 
standard)  and  B,  whereas  in  the  second  case  it  is  the  difference  between 
B  and  the  average  of  the  others  that  is  involved. 

For  completeness,  the  matrices  of  normal  equations  and  their 
inverses  for  the  examples  of  Section  3  are  shown  below. 


Linear  Drift 


X  = 


(n-1) 


(X'X)"^  =  -rjH 


X'X 


n(n-l)/2 


n(n-l)/2 
n(n-l)(2n-l)/6 


F(nMT 


n(n-l)(2n-l)/6  -n(n-l)/2 
-n(n-l)/2  n 


y  a  linear  function  of  x 


X  = 

'l  x/ 

X'X  = 

n  Zx 

■  nzx^-(zx)^ 

Zx^  -Zx 

1  X2 

Zx  Zx^ 

-Zx  n 

^  ^n 
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Gage  block  design 
X  = 


1 

-1 

0 

0 

-7 

-1 

0 

0 

1 

-5 

0 

0 

1 

-1 

-3 

0 

1 

-1 

0 

-1 

0 

1 

0 

-1 

1 

-1 

0 

0 

1 

3 

1 

0 

-1 

0 

5 

0 

-1 

1 

0 

7 

X'X  B 
B'  0 


4 

-1 

-1 

-2 

0 

1 

-1 

4 

-2 

-1 

n 

1 

-1 

-2 

4 

-1 

0 

0 

-2 

-1 

-1 

4 

0 

0 

0 

0 

0 

0 

168 

0 

1 

1 

0 

0 

0 

0 

X'X  B 
B'  0 


-1    =  1 


336 


35 

-35 

-7 

7 

0 

168 

-35 

35 

7 

-7 

0 

168 

-7 

7 

91 

21 

0 

168 

7 

-7 

21 

91 

0 

168 

0 

0 

0 

0 

2 

0 

168 

168 

168 

168 

0 

0 

Intercomparison  of  5  standards    (Sum  of  all  used  as  restraint) 

X  = 


1 

-1 

0 

0 

0 

1 

0 

1 

-1 

0 

0 

1 

0 

0 

1 

-1 

0 

1 

0 

0 

0 

1 

-1 

1 

-1 

0 

0 

0 

1 

1 

-1 

0 

0 

1 

0 

1 

0 

1 

0 

-1 

0 

1 

0 

-1 

0 

0 

1 

1 

0 

0 

1 

0 

-1 

1 

1 

0 

-1 

0 

0 

1 

X'X  B 
B'  0 


4  -1  -1  -1-1  0  1 
-1  4-1-1-1  0  1 
-1  -14-1-101 
-1  -1-14-101 
-1  -1-1-14  0  1 
0  0  0  0  0  10  0 
111110  0 
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X'X  B 
B'  0 


-1 


25 


4 

-1 

-1 

-1 

1 

-1 

4 

-1 

-1 

-1 

-1 

-1 

4 

-1 

-1 

-1 

-1 

-1 

4 

-1 

-1 

-1 

-1 

-1 

4 

0 

0 

0 

0 

0 

5 

5 

5 

5 

5 

6.    Standard  Deviation 

By  substituting  the  computed  values  for  the  parameters  into  the 
equations  of  expected  values  for  the  observation,  one  has  a  pn-tdlttdd 
valu^  to  compare  to  the  actual  observation.    The  difference,  d,  between 
the  observed  and  predicted  value  is  called  the  deviation  and  is  used 
to  determine  an  estimate,  s,  of  the  standard  deviation,  a,  of  the 
process 


s  = 


n-k+m 


where  n  is  the  number  of  measurements,  k  is  the  number  of  parameters  and 
m  is  the  number  of  restraints. 

Ordinarily  one  has  available  a  sequence  of  values  of  the  standard 
deviation  say  si,  S2,  S3,  .  .  .  ,  s^  based  on  v] ,  V2,  V3,  .  .  .  ,  degrees 
of  freedom.    One  forms  the  estimate  of  a  by  combining  these  in  quadrature 


a  = 


with  degrees  of  freedom  N  =  Zv.    In  assigning  a  standard  deviation  to 
the  parameters  or  linear  combinations  of  them,  the  value  a  is  used  rather 
than  the  value  of  s  from  a  single  experiment. 

The  variance  of  the  sums  of  two  parameter  values  is  given  by  adding 
the  corresponding  diagonal  terms  (variances)  in  the  inverse  of  the 
matrix  of  normal  equations  and  the  appropriate  off  diagonal  terms 
(covariances)  and  multiplying  by        For  the  case  of  the  intercomparison 
of  5  standards  given  at  the  end  of  Section  5: 
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s.d.  (A+B)  =  /a^  +  0^  +  2a^g  =  y^S^^^^I^]  = 


a/6 


For  the  variance  of  the  difference,  the  covariance  terms  enter  negatively 
so  that  for  the  same  example 


s.d.(A-B)  =  /a^  +  a2-2a^3  =  -=-/[4+4-2(-l )]  = 


a/10 


For  other  linear  combinations,  formula  1.10-M  of  the  Appendix  would  be 
used. 

For  the  linear  function  example,  the  predicted  value  of  y  for 
^0  ""^  ^o~  ^  ^^^^^        ^  variance  of 


^0^ 

^11  ^12 

"1  ~ 

^12  ^22 

X 

0 

where  the  terms 
Section  5  for 


s  C,-,,  Ci2»  Cpo  ai^e  the  elements  of  (X'X)"^  given  in 
the  wse  of  y  as  a  linear  function  of  x. 


7.    Correlated  Measurements 

In  the  previous  section  it  was  assumed  that  the  observations  were 
uncorrected,  i.e.,  that  V(yi)  =  a^,  cov(yi,  yj)  =0  or  in  matrix  form 
V  =  Var(y)  =  a^I  where  I  is  the  identity  matrix.    Section  4  of  the 
Appendix  discusses  the  general  case  where  one  knows  the  matrix,  V,  of 
variances  and  covariances  for  the  observations. 

Quite  often  a  transformation  of  variables  can  be  achieved  to  obtain 
variables  that  are  uncorrelated.    A  simple  example  is  provided  by  the 
case  of  cummulative  errors,  i.e.,  in  the  case  where 

Yl  =  ^1  +  ^1 

^2  ^  ^2     ^1  ^2 

y3  =  y3  +       +  ^2  ^3 


The  variance  covariance  matrix  of  the  y's  assuming  E(e:i)  =  0,  Var(£)  =  a^, 
cov(e-j£j)  =  0  is  given  by 
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V  = 


1    1    1  .  .  1 

12  2  2 
12    3  3 


1    2    3  .  .  n 
If  one  transforms  to  variables  where 

^1  =  ^1 


X3  =  y3  -  y2 


=  y2  -  ^1  ^2 
=  ^3  ■  ^2  ^3 


X  ~  y  -  y 
n  •'n 


n-1  =  ^n  -  ^-1 


+  e 


The  expected  values  and  variances  become 


E(X)  = 


^n-^n-1 


V(X)  = 


0 


0  0 


In  matrix  form  X  =  Ty  where  T  = 

1 

0 

0  . 

.  0 

0 

-1 

1 

0 

0 

0 

0 

-1 

1 

0 

0 

0 

0 

0 

-1 

1 

.  0 

0 
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and  if  one  computes  Var(Ty)  =  TVT',  one  gets 

Var{Ty)  =     10    0..  Ill 

■110  12  2  2 

0-11  12  3  3 


1  -1  0 
0  1  -1 
0    0  1 
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APPENDIX:    FORMULAS  FROM  STATISTICS 
1 .    Background  and  Notation 
(a)    Expected  Value 

The  expected  value,  y,  of  a  random  variable,  y,  will  be  written 

E(y)  =  y 

The  mean  y  may  represent  a  linear  function  of  some  basic 
parameters  6],  32j  .  .  .  6|<  with  known  coefficients 
XT,  X2,  .  .  .,xk 

E(y)  =  y  =  xi3l  +  y^2^2  +  •  •  •  +  \^)^ 

The  expected  value  of  n  observed  values  y] ,  y^*  • 
then  be  written 

ECy^)  =  x^^3i  +  ^^2^2  +  •  •  •  +  x^i^e,^  (1.1) 
E(y2)  =  X2^3i  +  ^^ih      •  •  •  +  Wk 


^(^n^  =  ^nl^l  '  \zh  ^  •  •  • 
This  may  be  written  in  matrix  notation  as 

'6 


E(y2) 

• 

E(y  ) 
•'n 

E(y) 



^11    ^12  •  •  •  ^Ik 


^21    ^22  •  •  •  ^2k 


3. 


(1.1-M) 


nl     n2  nk 


=  X3 

where  the  vectors  y  and  3  and  the  matrix,  X,  are  easily 
identified. 
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(b)    Variance,  Covariance 

The  variance,  a?,  of  a  random  variable,  y^. ,  is  defined  as 
O]  =  E{(yi  -  Pi)'}  =  E(y.^)  -  2y.E(y.)  +       =  E(yp  -  (1.2) 


and  the  covariance  a.,  of  the  variables  y.  and  y.  by 


(1.3) 


(1.4) 


a.j  =  E{(y.  -  y.)(y.  -  y.)}  =  E(y.y.)  -  y.y. 

The  variance  of  cy  where  c  is  some  constant  is 

Var  (cy)  =  E{(cy  -  cy)^}  =  c^a^ 
The  variance  of  a  sum  of  two  variables 
Var(y^  +  y^)  =  E{[y^  +       -  (y^  +  y^)]^  =  E{[(y^  -      )  +  (y2  -  Mz)]^) 

=  E(y^  -  y^)2  +  E(y2  -  y2)'  +  2E{(y^  -  y^)(y2  -  y2)} 
=      +  a|  +  2a^2  (1-5) 
which  we  may  write  as 


'a]  +  a^2" 

=    [1  1] 

0^2 

_^12  '2_ 

.^12  ^2  _ 

C] 


(1.5-M) 


For  independent  random  variables  a..  =  0  and 

Var(Ey.)  =  Ea^ 


(1.6) 


EXAMPLE: 

Var(ay^  +  by2  +  cy3)  =  E{[(ay^  -  ay^)  +  (by2  -  by2)  +  (cy3  -  cy3)]n 
=  a^a^  +  b^a^  +  c^a^  +  2aba^2     ^300^3  +  2bca23  ^^'''^ 
which  may  be  written  as 

(1.7-M) 


[a  b  c]fa|    0-12  o-^^' 


^12  ^2  ^23 


13    23  "3  J 
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(c)    Linear  Function  of  Random  Variables 
A  linear  function 

L  =  a,y,  +  a^yj  *  .  .  .  *  a„y„ 

has  expected  value 

E(L)  =  a^E(y^)  +  a^Ely^)  +  .  .  .  +  a^E(y^) 

or  in  matrix  notation 


E(L)  =  (ai  a2  .  .  .  a^) 


■E(y^)    =  a'u 

E(y2) 
LE(y^)J 


The  variance  is  given,  by  analogy  with  (1.7)  by 


V(L)  =  [a,  a 


1  "2 


.  a 


n] 


^1  ^12  •  •  •  ^In 
^21  ^2    •  •  •  ^2n 


^nl  ^n2  •  •  •  ^n 


which  reduces  to  the  usual  formula 

,2^2 


V(Za.y.)  =  Ea?a? 


(1.8) 


(1.9) 


(1.9-M) 


(1.10-M) 


(1.11) 


if  a. .  =  0. 

For  two  linear  functions  L,  and  L^,  the  covariance  term  is 
given  by 

^^h^^rh^ '  "  ^n^v^n^^^^i^^r^i^  +  +  ^^{y^-u^n} 

=  a^b^E(y^-y^)2  +  a2b2E(y2-y2)^  +  .  .  .  +  a^b^E(y^-u^)2 
+  (a^b2  +  a2b^)E(y^-y^)(y2-y2)  +  (a^b3  +  a3b^ )E(y^-y^ ) (y3-y3) 
+  (a2b3+a3b2)E(y2-y2)(y3-y3)  +  .  .  . 


21 


This  reduces  to  the  usual  formulas: 

then    Cov  (L-i,  Lg)  =  Za-b-a? 

then    Cov  (L^ ,        =  a^Ea.b^. 


If  a. .  =  0 


(1.12) 


If  a.  =  a 


For  the  case  of  L-j  =  a-iy-j  + 

^2^2  '3^3 

covariance  can  be  written: 

(a^  32  33) 

b^a^  +  b2ai2  + 

Vl3 

^2*^2  ^  '^1^12  ^ 

^3^23 

^3^3  ^  ^^1^13  ^ 

4^23 

^1    ^12  ^13 


^12  ^2  ^23 


^13  ^23  ^3 


(1.12-M) 


giving  the  general  formula  for  the  variance  and  covariance  of  two  linear 
functions 


an  dir\  •  .  .  a 
I    c  n 


^  ^2  •  •  •  ^ 


^12  •  •  •  ^In 


2  02    •  •  •  (72n 


a„    ...  a 

n    2n  n 


■^1  h 


^2  ^2 


a  b 
n  n 


(1.13-M) 


or  in  general  for  p  such  function,  i.e.,  for  a  pxn  matrix  A 

Var(AY)  =  AVA' 
(d)    Quadratic  Forms  in  Random  Variables 
We  have  from  (1.2) 

E(y2)  =       +  y2 


(1.14-M) 


(1.15) 


We  wish  to  extend  this  to  include  the  case  of  a  more  general 
quadratic  expression  in  the  y's,  consider  for  example 

E[(ay^  +  by2)']  =  Ea^y^  +  Eb2y2  +  2abE(y^y2) 
=  a^a^  +  a^y^  +  b^a^  +  b^y|  +  2abuiy2  +  2abai2 
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which  may  be  displayed  as  a  matrix  product  as  follows; 


"a^  ab~ 

W 

"a^  ab" 

ab  b\ 

ab  b^ 

+  [a  b] 


^1  ^12" 


^0^2  ^2  J 


This  example  illustrates  the  general  formula: 
E[(a^y^  +  .  .  +  a^y^)']  =  E^[y^  y2  .  .  y^] 


=  [yi  .  .  y,] 


'In 


.  a 


In 


^1  ^1^2 


a^a 
1  n 


a2ai  82  •  •  ^2^n 


^n^l  ^n^2  •  •  ^n 


+  [a^  .  .  aj 


raf  0^2  •  •  ^In 


^In  ^2n    •  % 


or 


where  A  = 


^1^2 


^1^2  ^2 


E{y'Ay}  =  y'Ay  +  a'Va 
and  V  = 


o\  a^2 


^12  ^2 


{1.16-M) 


The  last  term  can  be  replaced  by  the  trace  of  AV  so  that  we  have 

E(Y'AY)  =  y'Ay  +  Trace(AV)  (1.17-M) 

For  an  excellent  treatment  of  these  statistical  topics  one  should 
consult  Zelen   C5] . 
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2.    The  Gauss  Theorem  on  Least  Squares  (Independent,  Equal  Variance, 
Full  Rank) 

Let  the  n  observations  yi,y2>.  .  .»y^  have  expected  values 
E(y,)  =  x,,S,  +  x,2e2  .  .  .  x,^8|^ 
E{y2)  =  X2,B,  +  X2262  .  .  .  ^2kh 


(2.1) 


and  be  statistically  independent  with  common  variance, 
conditions  can  be  expressed  in  matrix  form  as  follows: 


These  two 


E(y)  = 


E  y^   =  X 


11  ^12 


^2      ^21  ^22 


^n        ^nl  ^n2 


^Ik 


^2k 


^nk 


=  X3 


(2.1-M) 


v(y)  = 


0'  0 


0  o' 

0  0 


=  an 


The  Gauss  theorem  states  that  the  minimum  variance  unbiased  linear 
estimator  of  any  linear  function,  L,  of  the  parameters,  3,  3^  .  .  .  3.  , 
say  '    ^  K 


L  =  a^3^  +  a232  +  . 


•  ^^k 


is  given  by  substituting  the  values  of  3^  which  minimize 
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Q  =  Z[y.  -  (x.^Bi  +  .  .  x.^3k)] 


(2.2) 


considered  as  a  function  of  the  3-.  These  values,  3-, ,  3^  .  . 
the  solutions  to  the  k  equations ,^cal led  the  nomat  iqaazioiu. 


^^il^l 


+    1.x,tX,o3o  +  .  .  +  ^x.^x.|^3,^  = 


'iri2^2 


Zx . ,y . 
1  r  1 


3|^  are 


Zx.„x,,3i  +  Zx?^3 


'i2"iri 


i2^2 


^^i2^•k\ 


Sx . • 


or  in  matrix  form 

(X'X)3  =  X'y 

The  solution  to  these  equations  can  be  written  as 

3  =  (X'X)"^X'y 


^^•kyi 


(2.3) 


(2.3-M) 


(2.4-M) 


because  X  was  assumed  to  be  of  rank  k.    The  matrix  (X'X)"^  is  the 
A^nvzA^t  o{i  the.  mcut/vix       noKmaJi  iiqucutioiU  and  plays  an  important  role 
in  least  squares  analysis.    Let  its  elements  be  denoted  by  c.  so  that 


' v^-l  - 


(X'X) 


^11  •  •  •  ^Ik 


^21  ^22 


.  c 


2k 


(2.5-M) 


^kl    ^k2  •  •  •  ^kk 


The  standard  deviation,  a,  is  estimated  from  the  deviations  d^ , 
where 


^-  =  ^i  -  (^-1^1  '  'izh  ^  '  ^ik^k) 


(2.6) 
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by  the  quantity,  s, 

s  =  /5^i  (2.7) 

and  is  said  to  have  n-k  dagxzd^  o{,  {^fizzdom. 

The  standard  deviation  of  the  values  for  the  coefficients  3^  are 
given  by 

s.d.(3^.)  =  a/c..  (2.8) 

and  for  a  linear  function  L  =  3,31  +  a^^go  •  •  •  a^Bi,  is  [see  equation 
(1.10-M)]  '  '       ^  ^  ^  ^ 


(2.9-M) 
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3 .    The  Gauss  Theorem  on  Least  Squares  (Independent,  Equal  Variance, 
With  Restraints) 

If  the  parameters,  3^-,  are  required  to  satisfy  the  m  linear  equations 


^m      ml  I       m2  2  mk  k  m 

or  in  matrix  form 

B'3  =  K  (3.1-M) 

then  using  the  method  of  Lagrangian  multipliers,  it  turns  out  that  the 
minimum  variance  unbiased  linear  estimators  are  given  by  minimizing 

F  =  Q  +  2A^  (i/;^  -  K^)  +  2X^{ilj^  -  K^)  +  .  .  +  2Xj^l^^  -  (3.2) 

considered  as  ^  function  of  the  3's  and  A's.    (2X-j  is  chosen  rather 
than  just  X-\  so  that  in  setting  9F/33i  =  0,  a  common  factor  of  2  can  be 
divided  out. ) 


This  leads  to  the  normal  equations 


(3.3) 


27 


or  in  matrix  form 


X'X  B 

"3" 

X'y 

B'  0 

X 

K 

(3.3-M) 


and  the  solution  is  given  by 


6 

'  X '  X  B~ 

-1 

'Vy 

A 

B'  0 

K 

(3.4-M) 


If  X'X  was  already  of  full  rank,  then  B  must  be  of  rank  m  for  the 
inverse  to  exist.    If  X'X  is  of  rank  (k-m)  and  B'  consists  of  m  rows, 
then  the  indicated  inverse  will  exist  if  B  is  orthogonal  to  X'X,  i.e. 
that  (X'X)B  =  0,and  B  is  of  rank  m.    Also  if  B  is  a  combination  of 
such  an  orthogonal  set  of  restraints,  denoted  by  H,  and  the  vectors  of 
X'X,  then  the  inverse  exists  if  the  mxm  matrix  B'H  is  of  rank  m,  i.e., 
the  determinant  |B'H  |  0. 

EXAMPLE:    If  the  differences  A-B,  B-C,  C-D,  D-E,  E-A  are  measured,  then 
the  5  measurements  yi,y2» ys, y4,y5  (assumed  independent  with  equal 
variance)  can  be  represented  as 


E(y)  = 


A-B 
B-C 


-A 


■ 

"1 

-1 

0 

0 

0 

1 

-1 

0 

D 

0 

0 

1 

-1 

D-E 

0 

0 

0 

1 

+E 

-1 

0 

0 

0 

=  X3 


X'X 


2 

-1 

0 

0 

-1 

-1 

2 

-1 

0 

0 

0 

-1 

2 

-1 

0 

0 

0 

-1 

2 

-1 

-1 

0 

0 

-1 

2 

rank  of  X'X  is  4 


The  restraint  A+B+C+D+E  =[11111] 


"a" 

=  H' 

A 

B 

B 

C 

C 

D 

D 

E 

-  - 

E 

=  K 
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is  orthogonal  to  X'X  because  H'(X'X)  =(11111)  (X'X)  =  [00000]. 
If  the  given  restraint  were  A  +  B  =  Kg,  then  B'  =  (1  1  0  0  0)  and  |B'H|  = 
2  /  0  so  that  the  restraint  is  sufficient  to  produce  a  solution. 

The  standard  deviation  estimate  is  changed  from  that  given  in 
formula  (2.7)  to  become 

s  =  /^_|^^^  degrees  of  freedom  =  (n-k+m)  (3.5) 

where  m  is  the  number  of  restraints. 

Formulas  (2.8)  and  (2.9)  still  apply  for  the  standard  deviation  of  the 
parameter  values  and  of  linear  combinations  of  them. 
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4.    The  Gauss  Theorem  on  Least  Squares  (General  Case) 

If  the  observed  values  y\  yz  •  -  '       '^^^^  variances  and 

covariances  a-,  so  that 
'  J 


Var  (y)  = 


'1 


^12  ^2 


12 

2 


'In 


'2n 


^In  ^2n 


=  V 


and  the  parameters  are  subject  to  the  m  restraints 

:    r  b„6, .....  b,^B,  -  K, 


or  in  matrix  form 


(4.1-M) 


(4.2) 


B'3  =  K 

Then  the  least  squares  estimators  for  6  are  given  by 


3 

X'V^X  B 

-1 

X 

B'  0 

K 

(4.2-M) 


where  as  before  X'  =  [A-]  X2  .  .  .  X^]  is  a  vector  of  Lagrangian  multi' 
pliers  entering  into  the  minimization  process. 

For  a  discussion  of  this  general  case,  the  reader  is  referred  to  the 
Goldman-Zelen  article  [4]. 
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